Accelerating Algorithms

Reconfigurable hardware used to accelerate image and signal processing algorithms

Exploit parallelism for speedup

Customize the design to fit the particular task

- signals in fixed- or floating-point format

- area and power vs. range and precision trade-offs
Goal: use the optimal format for every value
General Floating-Point Format
BIAS depends on the number of exponent bits

- 127 in IEEE single precision format

Implied 1 in mantissa not stored
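The two points above combine into a decoding rule. A minimal Python sketch, assuming the field order [sign | exponent | mantissa], an IEEE-style bias of 2^(exp_bits - 1) - 1 (which gives 127 for 8 exponent bits) and value = (-1)^sign x 1.mantissa x 2^(exponent - BIAS). The names `bias` and `decode` are hypothetical, and treating all-zero exponent and mantissa bits as the value zero is an assumption of the sketch.

```python
def bias(exp_bits: int) -> int:
    """IEEE-style bias: 2**(exp_bits - 1) - 1 (127 for 8 exponent bits)."""
    return (1 << (exp_bits - 1)) - 1


def decode(word: int, exp_bits: int, man_bits: int) -> float:
    """Interpret `word` as [sign | exponent | mantissa] with an implied leading 1.

    Assumption: all-zero exponent and mantissa bits encode 0.0; IEEE denormals,
    infinities and NaNs are not modelled.
    """
    sign = (word >> (exp_bits + man_bits)) & 1
    exp = (word >> man_bits) & ((1 << exp_bits) - 1)
    man = word & ((1 << man_bits) - 1)
    if exp == 0 and man == 0:
        return -0.0 if sign else 0.0
    significand = 1 + man / (1 << man_bits)   # restore the implied 1
    value = significand * 2.0 ** (exp - bias(exp_bits))
    return -value if sign else value


# IEEE single precision is the case exp_bits=8, man_bits=23.
print(decode(0x3F800000, 8, 23))  # 1.0
print(decode(0xC0400000, 8, 23))  # -3.0
# A 12-bit format with exp_bits=5, man_bits=6 (bias 15).
print(decode(0x3C0, 5, 6))        # 1.0
```

The same function covers any (exp_bits, man_bits) pair, which is the sense in which the IEEE formats are special cases of the general format.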


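Library of Parameterized Modules

A total of seven parameterized hardware modules
for arbitrary-precision floating-point arithmetic

Format control: denormalization, rounding and normalizing (rnd_norm)

Operators: fp_add, fp_sub, fp_mul

Conversion: fix2float, float2fix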


Module Latency

Module       Latency (cycles)
rnd_norm     2
fp_add       4
fp_sub       4
fp_mul       3
fix2float    4/5
float2fix    4/5


Highlights

Completely general floating-point format

- All IEEE formats are a subset

- All previously published non-IEEE formats are a subset

Normalization separated from the other operations
Rounding to zero or to nearest
Pipelining signals
Some error handling


Assembly of Modules

Denormalization

“Unpack” the input number: insert the implied digit

- If the input is the value zero, insert ‘0’

- Otherwise, insert ‘1’

Output is 1 bit wider than input

Latency = 0
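A bit-level model of the denormalization step, written in Python for illustration (the library itself is VHDL). The function name `unpack_implied_digit` is hypothetical, and treating all-zero exponent and mantissa bits as the value zero is an assumption. The output has the layout [sign | exponent | implied digit | mantissa], one bit wider than the input.

```python
def unpack_implied_digit(word: int, exp_bits: int, man_bits: int) -> int:
    """Make the implied mantissa digit explicit.

    Input:  [sign | exponent | mantissa]                 (1 + exp_bits + man_bits bits)
    Output: [sign | exponent | implied digit | mantissa] (one bit wider)
    The implied digit is 0 for the value zero (assumed: all-zero exponent and
    mantissa bits) and 1 otherwise.
    """
    mantissa = word & ((1 << man_bits) - 1)
    upper = word >> man_bits                     # sign and exponent fields
    exponent = upper & ((1 << exp_bits) - 1)
    implied = 0 if (exponent == 0 and mantissa == 0) else 1
    return (upper << (man_bits + 1)) | (implied << man_bits) | mantissa


# 1.0 in the exp_bits=5, man_bits=6 format is 0x3C0 (exponent 15, mantissa 0).
print(f"{unpack_implied_digit(0x3C0, 5, 6):013b}")  # 0011111000000
```

Reading the 13 output bits left to right gives sign 0, exponent 01111, the inserted digit 1 and mantissa 000000. Since no arithmetic is involved, only wiring and a zero test, a latency of 0 is plausible.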
Rounding and Normalizing

• Returns its input to the normalized format

• Designed to follow one or more arithmetic operations

Inputs: IN1, READY, EXCEPTION_IN, ROUND

Outputs: OUT1, DONE, EXCEPTION_OUT
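A behavioural Python sketch of rounding and normalizing. It assumes the preceding arithmetic stage hands over a sign, a biased exponent and an unnormalized integer significand with `frac_bits` fraction bits. The name `round_and_normalize` is hypothetical; the boolean `round_nearest` stands in for the ROUND input (its polarity here is an assumption), and round-to-nearest is implemented as round-half-up on the magnitude, which may differ from the hardware's tie rule. Exponent overflow and underflow, which the EXCEPTION signals would report, are not modelled.

```python
def round_and_normalize(sign: int, exp: int, man: int, frac_bits: int,
                        exp_bits: int, man_bits: int,
                        round_nearest: bool) -> int:
    """Normalize and round an intermediate result into [sign | exp | mantissa].

    Intermediate magnitude = (man / 2**frac_bits) * 2**(exp - bias), with `exp`
    already biased. `man` may carry extra integer bits (e.g. after a multiply).
    Exponent range is not checked in this sketch.
    """
    if man == 0:
        return sign << (exp_bits + man_bits)   # signed zero
    lead = man.bit_length() - 1                # position of the leading 1
    exp += lead - frac_bits                    # renormalize to 1.xxx form
    shift = lead - man_bits                    # surplus bits below the LSB
    if shift > 0:
        kept = man >> shift                    # truncation = round toward zero
        if round_nearest and (man >> (shift - 1)) & 1:
            kept += 1                          # round half up on the magnitude
            if kept >> (man_bits + 1):         # carry out, e.g. 1.111 -> 10.000
                kept >>= 1
                exp += 1
    else:
        kept = man << -shift                   # pad with zeros
    mantissa = kept & ((1 << man_bits) - 1)    # drop the now-implied leading 1
    return (sign << (exp_bits + man_bits)) | (exp << man_bits) | mantissa


# 1.5 * 1.5 in the exp_bits=5, man_bits=6 format: each significand is 0b1100000
# (6 fraction bits), so the raw product has 12 fraction bits; biased exponent
# 15 + 15 - 15 = 15.
print(round_and_normalize(0, 15, 96 * 96, 12, 5, 6, round_nearest=False))  # 1032
```

The result 1032 is exponent 16 with mantissa 0b001000, i.e. 1.125 x 2 = 2.25. Keeping this step separate lets several arithmetic operations share one normalization stage.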


Addition and Subtraction

Inputs: OP1, OP2, READY, EXCEPTION
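Because normalization is handled by a separate module, an adder only needs to align the operands, add them and pass the raw result on. A Python sketch under that reading, operating on unpacked (sign, biased exponent, significand) triples with the implied digit explicit. The names `add_unpacked`, `sub_unpacked` and `to_float` are hypothetical. For simplicity the sketch aligns exactly by shifting the larger-exponent operand left; a fixed-width hardware datapath would not be built this way.

```python
from typing import Tuple

Unpacked = Tuple[int, int, int]  # (sign, biased exponent, significand)


def add_unpacked(a: Unpacked, b: Unpacked) -> Unpacked:
    """Add two operands whose significands share the same number of fraction
    bits and carry the implied digit explicitly (as after denormalization).

    The result keeps that many fraction bits but is NOT normalized; a separate
    round-and-normalize stage is expected to follow.
    """
    (sa, ea, ma), (sb, eb, mb) = a, b
    e = min(ea, eb)
    # Exact alignment: rescale both significands to the smaller exponent.
    va = (ma << (ea - e)) * (-1 if sa else 1)
    vb = (mb << (eb - e)) * (-1 if sb else 1)
    total = va + vb
    return (1 if total < 0 else 0, e, abs(total))


def sub_unpacked(a: Unpacked, b: Unpacked) -> Unpacked:
    """a - b: flip the sign of b and add."""
    sb, eb, mb = b
    return add_unpacked(a, (sb ^ 1, eb, mb))


def to_float(x: Unpacked, man_bits: int = 6, bias: int = 15) -> float:
    """Interpret an unpacked triple (defaults match exp_bits=5, man_bits=6)."""
    s, e, m = x
    v = m / (1 << man_bits) * 2.0 ** (e - bias)
    return -v if s else v


one_and_half = (0, 15, 0b1100000)    # 1.5 with 6 fraction bits
three_quarters = (0, 14, 0b1100000)  # 0.75
print(to_float(add_unpacked(one_and_half, three_quarters)))  # 2.25
print(to_float(sub_unpacked(three_quarters, one_and_half)))  # -0.75
```

The sum comes back as significand 288 at exponent 14, i.e. 4.5 x 2^-1, which is correct but not in 1.xxx form; that is the work left for the rounding and normalizing stage.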


Multiplication 
Fixed to Floating-Point

Implementation Experiments

Designs specified in VHDL

Mapped to Xilinx Virtex FPGA

Wildstar reconfigurable computing engine by Annapolis Micro Systems Inc.

- PCI interface to host processor

- 3 Xilinx XCV1000 FPGAs

- total of 3 million system gates

- 40 Mbytes of SRAM

- 1.6 Gbytes/sec I/O bandwidth

- 6.4 Gbytes/sec memory bandwidth

- clock rates up to 100 MHz


Synthesis Results

[Figure: synthesis results plotted against total bitwidth]

Synthesis Results


K-means Algorithm

[Figure: image spectral data clustered into class 0, class 1, class 2 and class 3]
• Each cluster has a center:

- the mean value of the pixels in that cluster

• Each pixel belongs to the cluster whose center it is closest to

- requires a distance metric

• The algorithm is iterative
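The steps above can be sketched in a few lines of Python. The slides say only that a distance metric is required; this sketch assumes the Manhattan (L1) distance, a fixed number of iterations and ordinary Python arithmetic throughout, so it says nothing about the fixed-point or hybrid formats used in hardware. The names `manhattan` and `kmeans` are hypothetical.

```python
from typing import List, Sequence, Tuple


def manhattan(a: Sequence[float], b: Sequence[float]) -> float:
    """L1 distance between two pixels (the choice of metric is an assumption)."""
    return sum(abs(x - y) for x, y in zip(a, b))


def kmeans(pixels: List[Sequence[float]], centers: List[Sequence[float]],
           iterations: int) -> Tuple[List[int], List[List[float]]]:
    centers = [list(c) for c in centers]      # work on a copy of the centers
    labels = [0] * len(pixels)
    for _ in range(iterations):
        # Assignment: each pixel joins the cluster whose center is closest.
        labels = [min(range(len(centers)), key=lambda k: manhattan(p, centers[k]))
                  for p in pixels]
        # Update: each center moves to the mean of the pixels assigned to it.
        for k in range(len(centers)):
            members = [p for p, lab in zip(pixels, labels) if lab == k]
            if members:                       # an empty cluster keeps its center
                centers[k] = [sum(p[d] for p in members) / len(members)
                              for d in range(len(members[0]))]
    return labels, centers


labels, centers = kmeans([(0, 0), (1, 0), (10, 10), (11, 10)],
                         [(0, 0), (10, 10)], iterations=2)
print(labels)   # [0, 0, 1, 1]
print(centers)  # [[0.5, 0.0], [10.5, 10.0]]
```

The assignment step is the part that runs once per pixel per iteration (one distance per center), which is why it is the natural target for a hardware pipeline.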


Hardware Implementation of K-means Clustering

K-means Clustering Algorithm

[Figure panels: purely fixed-point; hybrid fixed and floating-point]
Structure of the K-means Circuit

INPUTS

Pixel and Cluster Center Data

Fixed-point format
12 bits, unsigned

Floating-point format
exp_bits=5
man_bits=6
(12 bits total: 1 sign + 5 exponent + 6 mantissa)

Fixed-point format
16 bits, unsigned
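A sketch of the fixed-to-floating conversion for this circuit's formats: an unsigned integer is mapped to the 12-bit format with exp_bits=5 and man_bits=6, rounding toward zero. The function name `fix_to_float`, its default parameters and the choice of truncation are assumptions for illustration, not the library's fix2float interface.

```python
def fix_to_float(value: int, exp_bits: int = 5, man_bits: int = 6) -> int:
    """Convert an unsigned integer to packed [sign | exponent | mantissa],
    rounding toward zero. The sign bit is always 0 for unsigned input."""
    if value == 0:
        return 0
    bias = (1 << (exp_bits - 1)) - 1
    lead = value.bit_length() - 1              # value lies in [2**lead, 2**(lead+1))
    exp = lead + bias                          # biased exponent
    if lead >= man_bits:
        frac = value >> (lead - man_bits)      # drop low bits: round toward zero
    else:
        frac = value << (man_bits - lead)      # exact: pad with zeros
    mantissa = frac & ((1 << man_bits) - 1)    # drop the implied leading 1
    return (exp << man_bits) | mantissa


print(fix_to_float(3))     # 1056: exponent 16, mantissa 0b100000 -> 1.5 x 2^1
print(fix_to_float(4095))  # 1727: decodes to 4064
```

With the implied 1, the format holds 7 significant bits, so 12-bit inputs above 127 lose low-order bits: the largest 12-bit value, 4095, comes back as 4064. A 12-bit input needs at most exponent 11 + 15 = 26, which fits in 5 exponent bits.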
Results of Processing

[Figure panels: purely fixed-point; hybrid fixed and floating-point]


Synthesis Results

Property             Fixed-point     Hybrid
Area                 9420 slices     10883 slices
Percent of FPGA      76%             88%
Minimum period       16 ns           20 ns
Maximum frequency    64 MHz          50 MHz
Throughput           1 cycle
Conclusions

A library of fully parameterized hardware modules for
floating-point arithmetic is available

Demonstrated the ability to form arithmetic pipelines in custom
floating-point formats

Future work

- More floating-point modules (ACC, MAC, DIV ...)

- More applications

- Automation of the design process using the library

• Automatically choose the best format for each
variable and operation


